Site Reliability Engineer (SRE)
Responsibilities:
1. Manage and operate cloud infrastructure across AWS or Azure and Kubernetes environments, ensuring optimal performance and resource allocation.
2. Develop and maintain automated systems to monitor, build, and scale environments, ensuring stable and reliable operations.
3. Perform capacity planning and implement cost controls to maximize service availability and efficiency across cloud resources.
Qualifications:
1. Bachelor’s degree in Computer Science or a related field, with 2+ years of experience in Site Reliability Engineering or a similar role in internet or cloud-based environments.
2. Hands-on experience with AWS or Azure and Kubernetes, with strong proficiency in Linux, networking, and storage concepts.
3. Solid experience in database and message queue operations, particularly with Kafka, MySQL, MongoDB, or Redis.
4. Proficiency in CI/CD processes with Java and Golang and in troubleshooting production issues is a plus.
5. Experience in one or more programming languages such as Shell, Python, Java, Golang, or C.
6. Strong problem-solving skills, sense of ownership, and excellent communication abilities.